ABSTRACT


Attrition from the Navy’s Delayed Entry Program (DEP) and attrition from 
Bootcamp are costly phenomena. The Commander of Naval Recruiting (CNRC) and the
Center for Naval Analyses (CNA) have periodically modeled both DEP and Bootcamp
attrition with logistic regression. This thesis analyzes current data provided by CNRC and 
CNA. Both DEP and Bootcamp attrition are modeled using logistic regression and tree- 
structured classification. For DEP, the logistic model indicates that individuals who 
accept incentives prior to enlistment (i.e., Navy College Fund or Enlisted Bonus 
Program) and individuals who change enlistment programs (while in DEP) have a 
significantly lower propensity to attrite from DEP than others. The DEP tree model 
indicates that an individual with a low Armed Forces Qualification Test (AFQT) score, 
no high school diploma and a long scheduled DEP duration has a 97% probability of 
attriting. For Bootcamp, the logistic model indicates that individuals who use tobacco 
products, individuals who do not exercise, and individuals who have criminal waivers
have a significantly higher propensity to attrite than others. The Bootcamp tree model 
shows that smokers and individuals with low AFQT scores have higher propensities to 
attrite than others. The models are tested using random partitions and this analysis 
shows that all of the models predict poorly at the individual level, despite strong
statistical significance.

EXECUTIVE SUMMARY


The accession of quality personnel continues to be a challenge for the Navy. 
Strong economic growth and low unemployment have decreased the pool of potential 
recruits and the Navy is having difficulty meeting its recruiting goals. The situation is 
exacerbated by a dwindling budget. The Navy is confronted with the challenge of doing 
more with less and must constantly find areas where financial savings are possible. 

Attrition, in both the Delayed Entry Program (DEP) and in Bootcamp, is one such 
area. It costs the Navy an average of $6500.00 per person to recruit an individual and an 
average of $1200.00 to begin their training in Bootcamp. These cost estimates aggregate 
the costs of testing, physical examinations, recruiter effort, DEP maintenance, shipping to 
Bootcamp and initial Bootcamp screening. An average of 19% of the individuals who 
enter DEP attrite, while an average of 13% of the individuals who enter Bootcamp attrite. 
DEP and Bootcamp attrition cost the Navy upwards of $139,000,000.00 per year (based 
on a shipping goal of 55,000 new recruits). 
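
The annual figure can be roughly reproduced from the rates and unit costs quoted above. The sketch below is one reading of the arithmetic, not a calculation taken from CNRC. It assumes that the 55,000 shipping goal is met after DEP losses, that each DEP attrite costs the recruiting amount, and that each Bootcamp attrite costs the recruiting amount plus the Bootcamp amount. The function and variable names are illustrative.

```python
def annual_attrition_cost(ship_goal, dep_rate, rtc_rate, recruit_cost, rtc_cost):
    """Estimate yearly attrition cost under the assumptions stated above."""
    # Contracts needed so that ship_goal recruits remain after DEP attrition.
    contracts = ship_goal / (1.0 - dep_rate)
    dep_attrites = contracts - ship_goal
    # Bootcamp attrites are taken as a share of those who actually ship.
    rtc_attrites = ship_goal * rtc_rate
    dep_loss = dep_attrites * recruit_cost
    rtc_loss = rtc_attrites * (recruit_cost + rtc_cost)
    return dep_loss + rtc_loss


total = annual_attrition_cost(
    ship_goal=55_000,      # shipping goal quoted in the text
    dep_rate=0.19,         # average DEP attrition
    rtc_rate=0.13,         # average Bootcamp attrition
    recruit_cost=6_500.0,  # average cost to recruit one individual
    rtc_cost=1_200.0,      # average cost to begin Bootcamp training
)
print(f"Estimated annual attrition cost: ${total / 1e6:.1f} million")
```

Under these assumptions the total comes to roughly $139 million, which is consistent with the figure quoted above.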

Attrition has been the focus of numerous studies, most of which predicted the 
probability of attrition as the dependent variable in a multivariate logistic regression 
model. This thesis analyzes attrition as a dependent variable using logistic regression and
also models the probability of attrition using tree-structured classification.
Tree-structured classification is an effective alternative to logistic regression and often
provides insight into the data which is not discernible with the logistic models.

The data used for this thesis were provided by CNRC, Code 20, and represented 
every individual scheduled to report to Bootcamp between October 1995 and December
1997. There were 130,486 records in the data set. For the analysis, the data are
randomly partitioned into sets for building DEP and Bootcamp models and sets for
testing the models. Further, since the output of both the logistic regression models and the
classification tree models is a “probability of attrition”, an optimal decision criterion (for 
scoring a fitted value as an attrite) is developed. This threshold is used to test the 
predictive power of each model. 
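
This summary does not spell out how the criterion is derived, so the following is only a sketch of the general idea: sweep candidate cutoffs over the fitted probabilities and keep the one that produces the fewest misclassifications. The toy data, the equal weighting of the two error types, and the function names are all assumptions made for illustration.

```python
def misclassified(probs, labels, cutoff):
    """Count records whose predicted class (prob >= cutoff) differs from the label."""
    return sum((p >= cutoff) != bool(y) for p, y in zip(probs, labels))


def best_cutoff(probs, labels, candidates):
    """Return the candidate cutoff with the fewest misclassifications (first wins ties)."""
    return min(candidates, key=lambda c: misclassified(probs, labels, c))


# Toy fitted probabilities of attrition and observed outcomes (1 = attrite).
fitted = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.95]
observed = [0, 0, 0, 1, 0, 1, 1, 1]
grid = [i / 20 for i in range(1, 20)]  # candidate cutoffs 0.05, 0.10, ..., 0.95

cutoff = best_cutoff(fitted, observed, grid)
errors = misclassified(fitted, observed, cutoff)
print(f"cutoff={cutoff:.2f}, misclassified={errors}")
```

A different loss, for example one that penalizes a missed attrite more heavily than a false alarm, would generally select a different cutoff.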

Several significant factors are found with the logistic models. For DEP attrition, 
the factors that increase the probability of attrition with an increase in their value are age, 
race (white or black), Government Equivalency (GED) high school diplomas and 
scheduled DEP duration. The factors that decrease the likelihood of attrition with an 
increase in their value are Armed Forces Qualification Test (AFQT) score, sex (male), 
accepting incentive programs (Navy College Fund or Enlisted Bonus), enlisting as a 
senior in high school and changing programs while in DEP.

For Bootcamp attrition, the logistic models indicate that the probability of attrition 
increases with increases in age, race (white and black), GED high school diplomas, 
waivers (crime and other), tobacco use and program changes. The factors that decrease 
the probability of attrition with an increase in their value are AFQT score, long DEP 
duration, and exercise (running or jogging at least three times a week). 

The tree models identify several interesting relationships. First, the DEP tree 
shows that individuals who enlist as seniors but do not graduate from high school or 
graduate with a GED have a 98% chance of attriting. Second, individuals with no high 
school degree and an AFQT score below 49.5 who do not enlist as seniors in high school 
have a 76% chance of attriting. Third, individuals who do not graduate from high school,
have an AFQT score below 49.5 and are scheduled for long DEP durations have a 97%
chance of attriting. The Bootcamp tree identifies smoking and low AFQT scores as
increasing the probability of attrition. The trees reveal structure within the data which is 
not identified through logistic regression. 

Once the models are constructed, they are tested using the random partitions 
mentioned earlier. The DEP tree node, with a 98% attrite probability (mentioned above), 
correctly predicts 3954 attrites, while the DEP logistic model predicts only 71. Both of 
the Bootcamp models predict poorly. Further analysis of the DEP tree node with 3954 
correct predictions reveals that the educational codes of individuals who quit from the 
DEP are suspect and the tree’s predictive power should be scrutinized. 

Many of the predictive factors found in this analysis have been identified in 
previous research, but the classification methodology identifies several interesting 
relationships not previously documented. All of the models have strong statistical 
significance and weak predictive performance. Policies that exclude individuals, based
on these results, are not recommended.

I. INTRODUCTION


A. BACKGROUND 


Technological advancements in both modern warfare and its strategies have 
enabled the Navy to reduce its force structure while maintaining operational readiness. 
Despite all of the new hardware and software, the key asset remains people; it is naval 
personnel who man the high-tech workstations and the ships at sea. 

Naval personnel needs are met with an all-volunteer force that is either actively 
recruited by representatives of the Naval Recruiting Command (CNRC) or accessed 
through one of the Officer Programs. Since the Navy is all-volunteer, it is competing in 
the domestic job market with both the other branches of the military (Army, Air Force, 
Marine Corps) and the civilian sector. This dependence upon the job market for personnel 
subjects the Navy to the same economic forces as corporate America. For example, when 
unemployment is high, it is much easier for the Navy to recruit than when it is low. 
Currently, the United States is experiencing a 20-year low with respect to unemployment 
while the Navy is having difficulty meeting its recruiting goals and many fleet units are 
undermanned. 

There is more to both manning and recruiting difficulties than the unemployment 
rate. The fiscal constraints that accompany the mandated reduction in forces and the 
changing roles of the military have been mentioned as possible causes of the difficulties 
(CNRC, Code 20, 1997). Given the changing environment, the Navy must continuously
review its manpower policies and find areas with potential for improvement.


One of these areas is attrition, the unplanned loss of individuals who have 
promised to join or are already in the Naval Service. Thirty-two percent of the
individuals who initially sign contracts attrite before their fleet service begins. These
attrition losses inflate goals and quotas and waste assets because the Navy expends
resources when recruiting and conducting initial skill training. This thesis analyzes the
attrition phenomenon.


B. THE RECRUITING PROCESS 

For the purposes of this paper, the recruiting process is defined as “Enlisted
Recruiting.” “Officer Recruiting” will not be included in this analysis. 

1. Setting Goals and Quotas 

The recruiting process is driven by congressional mandates and fleet needs. 
Congress, after reviewing budgetary and strategic considerations, sets the force size in 
terms of numbers of personnel required to fill each pay-grade within the naval force 
structure. This set of numbers is a target, which must be maintained within 1% (CNRC, 
Code 20, 1997). Given the congressional requirements, the Bureau of Naval Personnel
(BUPERS) is charged with continuously analyzing the status of forces to determine 
accession requirements. Figure 1 summarizes the goal/requirements process. 

Figure 1. Process Overview

BUPERS answers fleet needs generated by the various Operational,
Administrative and Training Commanders (represented in Figure 1 as fleet units). Each
of these Commanders has actual billets (or jobs) authorized within the force structure.
For example, an aviation squadron with sea-going detachments may be authorized eight
aviation electricians below the pay-grade of E-5 (Petty Officer, Second Class); if the
billets are not completely filled, the Commanding Officer will request additional
personnel via BUPERS. BUPERS will weigh this request with the requests of other
Commanders and with the overall status of forces. BUPERS will then either fill or “gap”
the billet (gapping a billet implies that the billet will remain vacant until a suitable
replacement is identified). Not every fleet need is planned for; sailors may separate from
service for disciplinary reasons or new operational requirements may arise. In any case,
if BUPERS elects to fill the billet, it has several choices.

First, an individual already in service may fill the billet. Depending upon the 
nature of the vacancy, this may warrant gapping another Commander’s unit. For
example, if a sea-going detachment from the aviation squadron needs an electrician for a
detachment departing for a regional conflict, BUPERS might transfer an electrician from
a non-deploying aviation unit. 

Second, BUPERS may identify an individual who is currently in the training 
pipeline to fill the billet. This transfer will occur at the completion of training. In this 
instance, a member of a training class with an appropriate graduation date is selected 
rather than a specific individual. The third method is to recruit a new individual. This 
method transfers the requirement to the recruiting command. These three methods 
require increasingly longer periods of time to fill the billet. 

In each case, it takes some unspecified period of time before the Commander has 
his billet filled. If the need is not planned for, and the only way to fill the billet is with a 
new recruit, it will take at least three months (in the case of a non-rated sailor) and may 
take as long as two years (in the case of a nuclear power plant technician) to fill the billet. 
For an aviation electrician, the process would take approximately eight months. Planning 
for these needs is critical in maintaining fleet manning levels.

BUPERS employs an array of planning models that forecast these fleet 
requirements. The specific models are beyond the scope of this paper but it suffices to 
say that they help the community managers within BUPERS balance the fleet needs and 
congressional mandates by using historical data. The end result is that the community 
managers generate quotas for new accessions. The quotas are rating, month, and gender 
specific (e.g., the Navy may need 460 male aviation electricians to enter bootcamp in 
April). These quotas are designed to get individuals into the training pipeline to meet 
fleet requirements in the future. Filling these quotas is the responsibility of CNRC. 
CNRC analyzes the quotas and incorporates additional congressional mandates. For
example, quotas are often sub-categorized by CNRC to include race and educational
background. CNRC divides the quotas into goals for each of its recruiting areas.


[Figure: flow diagram. Legible stage labels: Interview/Testing, Classification, Accession/Bootcamp, A-School, Fleet.]

Figure 2. The Recruiting Process





The goals are ultimately transferred to the recruiting districts and the individual
recruiters. There are approximately 3500 active recruiters in the Navy, with an aggregate
goal of approximately 55,000 new recruits (for FY 1998). Simple analysis shows each
recruiter should send an average of 1.3 new recruits to bootcamp per month. At the
recruiter level, the quotas are specified with respect to race, educational background and
gender, and individual recruiter goals reflect the demographics of the recruiting region.
For example, at a given instant in the Seattle recruiting district there may be only two
slots for female aviation electricians for the month of May. Such restrictions, combined
with the management practices of the districts, may yield individual recruiter goals as low
as one new recruit per month or as high as five new recruits per month.
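
The 1.3 recruits per recruiter per month quoted above follows directly from the aggregate goal and the recruiter count. A minimal check of that arithmetic, assuming the goal is spread evenly over twelve months and all recruiters (the constant names are illustrative):

```python
ANNUAL_GOAL = 55_000   # new recruits for FY 1998, as quoted in the text
RECRUITERS = 3_500     # approximate number of active recruiters
MONTHS_PER_YEAR = 12

# Average number of recruits each recruiter must ship per month.
per_recruiter_per_month = ANNUAL_GOAL / RECRUITERS / MONTHS_PER_YEAR
print(f"{per_recruiter_per_month:.1f} recruits per recruiter per month")
```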

2. Recruiting 

The transition from civilian life to naval service is a complex process for the 
majority of accessions. This process is summarized in Figure 2. Armed with quotas, 
field recruiters seek to contact as many potential recruits as possible. Some interested 
individuals simply walk into a recruiter’s office; others may fill out the information page 
on the Navy’s website and be directly called by a recruiter. Many initial contacts come 
from recruiter presentations to local high schools and community colleges. The goal of 
the contact phase is to generate interviews. 

The interview is where the prospective recruit (prospect) sits down with the 
recruiter to get the sales pitch. This pitch describes all of the possible opportunities 
(within the Navy) available to a new recruit. This is also the first opportunity for the 
recruiter to query the individual. The recruiter may directly ask the individual about past 
drug use, legal problems, or other barriers to recruitment. 

If a qualified recruit remains interested, he or she may then be scheduled for the 
Armed Forces Qualification Test (AFQT). The AFQT is a standardized test designed to 
evaluate an individual’s cognitive abilities and to determine the military tasks in which he 
or she might excel (if any). It is scored on a percentile scale from 1 to 99, with 99 being 
considered outstanding (CNRC, Code 20, 1997). After the interview and AFQT, a
recruiter may do an initial classification of the individual by using the CNRC recruit
quality matrix (RQM), which is depicted in Figure 3.

Figure 3. Recruit Quality Matrix (Courtesy of CNRC, Code 20)


The left side of the matrix shows AFQT scores; breakpoints are indicated in the 
picture. These scores are used to categorize prospects according to Test Score Category 
(TSC). Each prospect falls into a cell based upon TSC and his or her educational level. 
An individual who falls in cell A is highly desirable while an individual who falls in cell 
D is accepted only when severe recruiting shortages occur. There are mandated 
percentage limits on the maximum number of individuals from certain cells who may be 
recruited during normal operations. Ninety-five percent of the total accessions must be high school
graduates (this is more stringent than the congressional mandate of 90%), with 65% from 
category III-A or above (BUPERS LTR, 15 Jul 1997). 

If the prospect is found to be qualified he or she will then be scheduled for a 
physical examination. Physicals are conducted at the Military Entrance Processing
Stations (MEPS) located throughout the country. If a problem becomes apparent during
the physical, the individual may be disqualified or a waiver package may be submitted by
the recruiter. Upon completion of the physical, the qualified prospects proceed to 
classification. 

During classification, a qualified prospect sits down with a classifier who weighs 
Navy needs for specific rates (the quotas) with the desires, test scores and academic 
credentials of the individual. For example, if the Navy has slots available for aviation 
electricians in June and the individual wants to be an aviation electrician the classifier 
will, generally, fill the slot with the individual. However, even if there is an opening for 
an electrician and it is the individual’s first choice, there may be an urgent need for 
another rate (e.g., nuclear power technicians). If the individual is also qualified for this 
billet, the classifier may try to sell it to the prospect. If the prospect does not seem 
interested, the classifier can offer incentive packages. The two prime incentive plans are 
The Navy College Fund and The Enlisted Bonus Program. 

The Navy College Fund (NCF) provides $30,000.00 to $40,000.00 for college to 
qualified individuals who successfully complete training in the specified field. For 
example, in the Nuclear Field, the Navy will pay $40,000.00 and for Aviation 
Electronics, $30,000.00. The Enlisted Bonus Program (EB) provides cash ranging from
$1000.00 (Aviation Electricians) to $12,000.00 (Nuclear Field) for those who complete
training (CNRC, Code 20, 1997; BUPERS MSG DTG 091131Z Dec 1997). A
prospective recruit may choose one, but not both, of these plans.

Classifiers do whatever they can to funnel individuals to the proper pipelines but 
will not do so at the expense of losing the recruit. If a prospect is qualified then he or she 
may be enlisted with no job assignment. In this case, classification is delayed and the
enlistment still occurs. Once the classification phase is complete the individual is
enlisted in the Naval Reserve until he or she ships to bootcamp. The enlistment often
occurs immediately following classification, which is usually the same day as the 
physical. 

The final category of personnel to be discussed is those who qualify for waivers. 
In each of the previous phases, interviews and physicals may have found some trait or 
historical fact that makes the recruit generally unacceptable. In these cases, the recruiter 
may apply for a waiver of standards for the individual. CNRC evaluates these waivers on 
a case-by-case basis and may deem the candidate qualified. Waivers for prior drug use,
physical impairments and prior legal problems are common.


C. ENLISTMENT 

1. Delayed Entry Program 

After enlistment, recruits take one of two paths. If scheduled to begin bootcamp 
within 30 days, they are categorized as direct shippers and simply wait to be shipped to 
bootcamp. If they are not scheduled for bootcamp within 30 days, they enter the Delayed 
Entry Program or DEP. Individuals in the DEP attend monthly meetings and are tracked 
by their recruiter or a recruiting representative. While in DEP, they are expected to 
exercise and prepare for bootcamp but are not formally required to do anything. DEP is 
the first place in which qualified individuals attrite. Generally, the individual simply fails 
to report to bootcamp or quits, but a variety of other reasons have been identified. The 
categories in Figure 4 represent aggregates of the actual DEP attrite codes furnished by 
CNRC. The data were a set of 21,332 DEP attrites (out of 112,275 contracts) who dropped
out between July 1995 and October 1997. “Admin” attrites reflect individuals who left
DEP due to administrative errors such as a change in their bootcamp shipping date
or reclassification due to the needs of the Navy. The “Drugs/Alcohol” attrites represent
individuals who failed urinalysis or had alcohol addiction problems. The “Medical”
attrites represent those who had unwaiverable medical problems such as Crohn’s disease.


[Figure: pie chart of DEP attrition by category. Legible slices: Drug/Alcohol 4%, Technical 17%, Screen 12%, Failed to Obligate 48%.]

Figure 4. DEP Attrition Breakdown

The “Failed to Obligate” attrites simply quit. The “Screen” attrites represent 
individuals who had unacceptable and unwaiverable behavior in their past which was not 
discovered until DEP service began; quite often legal trouble falls into this category.
Finally, the “Technical” category represents those individuals who became ineligible
during DEP; pregnancy and death are included in this category. A complete breakdown
of the aggregate categories and their associated attrition reasons can be found in
Appendix A. On average, 19% of the individuals who enter the DEP never enter
bootcamp.

2. Indoctrination Training 

Those individuals who do not attrite from the DEP ship to bootcamp. Bootcamp 
is conducted at the Recruit Training Center Great Lakes, Illinois (RTC). Indoctrination 
begins with a thorough medical screening, which includes urinalysis. While in 
bootcamp, recruits are volunteers and may quit at any time. 

Indoctrination training is scheduled for eight weeks and ends by attrition or
graduation for each individual. Upon graduation, the new recruit may either proceed to
skills training (referred to as A-School) or directly to the fleet (if no skills training is
required). If the individual attrites, he or she is sent home.


[Figure: pie chart of RTC attrition by category. Legible slices: Academic 0.5%, Behavior 25%, Technical 0.5%.]

Figure 5. RTC Attrition Breakdown

Reasons for bootcamp attrition are as varied as those for DEP attrition and are 
summarized in Figure 5. The categories in Figure 5 represent aggregates of the RTC
attrite codes used by the staff in Great Lakes. The “Academic” category represents
academic failure in the course work during the training program (including language
deficiencies). The “Behavior” category describes actions by an individual during training 
which are not consistent with military service (e.g., sleepwalking, suicidal behavior, 
bedwetting). “Screen” encompasses the prior problems that were not evident during the 
recruiting process (e.g., failing the indoctrination urinalysis). The “Admin”, “Technical” 
and “Medical” categories are similar to those described in the DEP attrition description. 
A complete breakdown of the aggregate categories and their associated attrition reasons 
can be found in Appendix B. On average, 13% of the individuals who entered bootcamp
failed to graduate.


D. COST ESTIMATION 

1. Recruiting Costs 

Estimating the cost expended on each recruit can be broken down into two
distinct parts. The first estimate covers the recruiting process while the second
covers the costs associated with shipping and bootcamp. CNRC derives the first
estimate with the Planned Resource Optimization Model (PRO model) developed by 
Schmitz and Reinert (1995); Figure 6 summarizes the model. 

The PRO model is designed to “estimate the costs of recruiting different types of
individuals under different market conditions” (Schmitz and Bohn, 1996). Additionally, it
provides CNRC with an optimal resource allocation schedule and a “recruits per
recruiter” goal schedule. Using this model with input parameters from February 1998
(unemployment rate, current number of recruiters, etc.), sensitivity analysis for various
hypothetical attrition rates was performed. The results are summarized in Table 1.

Figure 6. Planned Resource Optimization Model

The cells in Table 1 represent, in thousands of dollars, the cost to recruit an
individual of a given cell type under varying hypothetical attrition rates ranging from
19% to 0%. For example, when attrition decreases from 17% to 15%, the cost to recruit
A-cell individuals drops from $6900.00 to $6700.00. With the current state of
unemployment (20-year low), it makes sense that it is more expensive to recruit talented
A-cell individuals than B-cells, as the former can more easily find employment in the
civilian sector. C-cell individuals have the second highest recruiting cost; this is attributed
to their higher than average attrition rate, which drives their relative costs up in the PRO
model.

Table 1. Cost Per Recruit/DEP Attrition Percentage

Cell ($ x 1000)    Hypothetical Attrition Rates
                   15%    13%    11%
A-Cell
B-Cell
C-Cell


From a cost standpoint, the $200.00 savings (per A-cell recruit) realized when the 
attrition rate is reduced from 17% to 15% results in a potential cost savings of 
$7,000,000.00 per year ($200.00 * 35,000 A-Cells = $7,000,000.00). The cost incurred
during the recruiting process must also include DEP management costs. CNRC estimates 
that current management practices, which involve monthly contact and special events, 
result in a $50.00 per month expenditure per recruit (Schmitz and Bohn, 1996). 

2. Bootcamp Costs 

Jacklich (1998) recently estimated the costs associated with sending an individual 
to bootcamp. Individuals who fail the initial drug screening spend an average of nine 
days at Great Lakes. The nine-day average cost (food, lodging, clothes, etc.), when
combined with the cost of the plane ticket to RTC and the bus ticket home, results in an
expenditure of $1200.00 per attrite. Depending upon the geographical origin of the new
recruit, this amount can be as low as $900.00 and as high as $1500.00 (Jacklich, 1998).
Analysis of RTC attrition data indicates that the average amount of time all attrites
(including drug attrites) spend in RTC is 12 days, so Jacklich’s cost estimate is a useful
lower bound.

Using Jacklich’s estimate, sensitivity analysis with respect to varying attrition
rates was performed. A 1.0% decrease in the RTC attrition rate increases the average
number of RTC completions by 583 recruits per year. Weighting this estimate by RQM
cell type and multiplying by the relative costs yields the results summarized in Table 2.


Table 2. RTC Savings

Parameter/Cell           A-Cell       B-Cell       C-Cell       Total
Number Recruits          350          29           204          583 (Average)
Cost Multiplier**        $5,900.00    $4,700.00    $5,600.00    N/A
Savings (in Millions)    $2.06        $0.14        $1.14        $3.34
**Based upon Zero DEP attrition (hypothetical lowest cost)
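
The entries in Table 2 follow from multiplying the additional completions in each cell by that cell's cost multiplier. A minimal sketch of the arithmetic, using the table's own figures (the variable names are illustrative):

```python
# Additional RTC completions per year from a 1.0% drop in attrition, by RQM cell.
recruits = {"A-Cell": 350, "B-Cell": 29, "C-Cell": 204}

# Cost multiplier per recruit, based on zero DEP attrition (Table 2).
cost_multiplier = {"A-Cell": 5_900.0, "B-Cell": 4_700.0, "C-Cell": 5_600.0}

# Savings per cell in dollars, then the overall total.
savings = {cell: recruits[cell] * cost_multiplier[cell] for cell in recruits}
total_recruits = sum(recruits.values())
total_savings = sum(savings.values())

for cell, dollars in savings.items():
    print(f"{cell}: ${dollars / 1e6:.3f} million")
print(f"Total: {total_recruits} recruits, ${total_savings / 1e6:.3f} million")
```

Shown to three decimals, the computed values match the table entries to within two-decimal rounding.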





E. PREVIOUS RESEARCH 

The high cost of recruiting an individual and sending him or her to bootcamp 
illustrates the need for minimizing unplanned losses. The issue is not new; it has been 
the focus of numerous studies. This section summarizes some of the prior research. 

In 1995, Martin published a dissertation analyzing Army attrition. He modeled
first-term attrition using contingency tables and logistic multiple regression models.
Once the models were built they were tested with a range of “goodness of fit” 
diagnostics. Prior to modeling, Martin partitioned his data into two sets, one to build the 
model and one with which to test it. This process was designed to avoid over-fitting. His 
results broke individuals into two groups: high-risk and low-risk. Included in the high- 
risk category were overweight males, males with a history of problems with civil 
authorities, enlistees who signed up to “change their life,” and high school drop outs 
(non-grads). Included in the low-risk category were minorities, females over 21 years of 
age, male college graduates, individuals with an AFQT over 65, and individuals who 
indicated they were interested in advanced education (Martin 1995). 

Another study was a thesis by Murray (1985) which studied DEP attrition for the
Navy. Murray employed several logistic regression models in an effort to predict DEP
attrition. She found that non-grads, individuals with high AFQT scores (above 65),
individuals with long DEP stays (over 7 months), and individuals over 21 had a higher
propensity to attrite (Murray 1985). Her findings contrast with those of Martin.

Matos conducted a third study in 1994. Matos analyzed the Navy’s Delayed 
Entry Program and the effects of an individual’s time spent in DEP. He employed a log- 
linear regression model, contingency tables and conditional probability theory to describe 
DEP attrition. He concluded that an increase in DEP length resulted in an
increase in DEP attrition but a decrease in fleet first-term attrition. He also found that
non-grads had a higher propensity to attrite than high school graduates (Matos 1994).

Bohn and Schmitz published an analysis in 1996 for CNRC. This study used 
OLS regression and logistic regression models to analyze DEP and RTC attrition factors. 
Bohn and Schmitz subdivided the recruit pool into two categories, work force and high 
school seniors. They assert that an individual who is recruited directly from the work 
force is different from an individual who signs up as a senior in high school. In the DEP
analysis, they found that AFQT was inversely related to attrition, that seniors with 
dependents are more likely to attrite than those without, that Hispanics are more likely to 
attrite than non-Hispanics, that age is directly correlated with attrition, and that long DEP 
time leads to higher attrition among women. In the RTC analysis, Bohn and Schmitz 
found that non-grads have a higher likelihood to attrite, that AFQT scores are inversely 
related to attrition and that older individuals have a higher propensity to attrite. Bohn
and Schmitz also formulated an optimization model for DEP duration, which minimizes
DEP attrition and RTC attrition.











Finally, there is an analysis published by Quester, MacIlvaine and Barfield in
1997 for the Center for Naval Analyses (CNA). Their study used OLS regression and
descriptive statistics to analyze RTC attrition. This analysis incorporated data from a 
new survey (known as the SHIP survey) which is being administered to all accessions at 
bootcamp. This survey added possible predictors such as smoking and exercise to the 
data set and subsequent analysis. The study reported that non-smokers, A-cell 
candidates, Asians, recruits with no enlistment waivers and recruits who accessed 
through the DEP (rather than shipping directly to RTC) were less likely to attrite than
others (Quester et al. 1997).

The review of previous studies shows some common threads in analysis and
results. Most previous research has employed logistic regression and most previous
research found that A-cell candidates, candidates with some DEP exposure, and
minorities are less likely to attrite. The availability of the new SHIP data enabled
Quester et al. to explore many other potential predictors with interesting results. The
SHIP data (updated through Dec 1997) were available for this study.


F. RESEARCH GOALS/HYPOTHESES

Starting with previous research and using both CNRC personnel data and CNA
SHIP data, this paper will try to further explain DEP and RTC attrition. The analysis will
employ logistic (logit) regression techniques for comparison but will focus on
classification tree methodology as a means to explain the attrition data. Specific
hypotheses are that:


• Individuals who smoke and do not exercise have a higher propensity to attrite
from RTC;

• A-Cell individuals have a lower propensity to attrite from both DEP and RTC;

• Individuals who sign up for the Navy College Fund or EB program have a
lower propensity to attrite from DEP and RTC.

Given the above hypotheses, this analysis also has the following research goals:

• To identify, post hoc, other significant predictive factors (not found in
previous research);

• To compare and contrast the logit regression and the classification tree
methodologies for this type of data set;

• To address the policy implications of the resulting predictive model.
II. METHODS


A. DATA 


Data for this analysis were provided by CNRC, Code 20, who merged data 
from CNA, RTC and CNRC databases. The data consisted of all individuals who were 
scheduled to report to RTC between October 1995 and December 1997. There were a 
total of 130,486 records in the data set, which was sorted by individual Social Security 
Number. 

The data was imported to Microsoft Access for initial analysis and validation. 
Access was first used to search for null fields and bad data. Approximately 12,000 of the 
records had more than one null field; several of these had more than 4 null fields. To 
avoid potential problems with the analysis, the records with more than one null field were 
removed from the data set. There was concern that, in doing so, the data set would be 
compromised, so before assuming the null records were random occurrences, each 
column of the null set was plotted to check for uniformity and conformity with the 
remaining data set. For example, the number of null fields was plotted for each NRD to 
ensure that no single NRD or Area was consistently failing to input the data. Further, 
binomial probability hypothesis tests were used to compare categorical variables. This 
analysis identified several columns (variables) which were not complete; the data was not
collected for DEP attrites. As a result, many of the variables available for RTC analysis
were not available for DEP analysis. The variables available for DEP analysis are
marked in Table 3 with a “*”.
Table 3: Data Descriptions

Variable        Description
SSN             Social Security Number of the Individual
WHITE*          Binary (0,1), 1 if White
BLACK*          Binary (0,1), 1 if Black
HISPANIC*       Binary (0,1), 1 if Hispanic
ASIAN           Binary (0,1), 1 if Asian or Pacific Islander
NRD*            3 Digit Code Representing Recruiting District of the Individual
SENIOR*         Binary (0,1), 1 if Individual was a Senior in High School upon Enlistment
PROGRAM-1*      Initial Rating Assigned (String)
PROGRAM-2*      Final Rating Assigned (String)
BONUS*          Binary (0,1), 1 if Individual Signed up for EB
NAVY COLLEGE    Binary (0,1), 1 if Individual Signed up for Navy College Fund
NONGRAD         Binary (0,1), 1 if Individual did not Graduate From High School
[illegible]     Binary (0,1), 1 if Individual Graduated from High School (NO-GED)
GED             Binary (0,1), 1 if Individual Graduated with GED
[illegible]     Letter Code Assigned by CNRC to Categorize a DEP Attrite
[illegible]     Digit Code Representing RTC Attrition Category
SCHEDDEP        Number of Days Individual was Scheduled for DEP
[illegible]     Number of Days Actually Spent in DEP
[illegible]     Number of Dependents
[illegible]     Month Individual Shipped to RTC
[illegible]     Month Individual Attrited from either DEP or RTC
CRIME           Binary (0,1), 1 if a Waiver was Granted for Previous Criminal Behavior
DRUG            Binary (0,1), 1 if a Waiver was Granted for Previous Drug Use
[illegible]     Binary (0,1), 1 if a Waiver was Granted for a Medical Condition
OTHER           Binary (0,1), 1 if a Waiver was Granted for Any Other Reason
SMOKE           Binary (0,1), 1 if Individual Indicated on SHIP Survey: Smoker
CHEW            Binary (0,1), 1 if Individual Indicated on SHIP Survey: [illegible] Tobacco
RUNJOG          Binary (0,1), 1 if Individual Indicated on SHIP Survey: Ran or Jogged at least 3 Times a Week
DEP ATTRITE**   Binary (0,1), 1 if Individual Attrited from DEP
RTC ATTRITE**   Binary (0,1), 1 if Individual Attrited from RTC
JOBCHANGE       Binary (0,1), 1 if PROGRAM-1 differs from PROGRAM-2

*Indicates variable was available for DEP analysis
**Indicates dependent variable


The search results and analyses of the variables indicated there was no reason to
believe the null field occurrences were not random events (with respect to their
variables). Consequently, the individual records (rows) with more than one null field
were removed from the data set.

Once these records were removed, the remaining data were randomized. A column of
random numbers, uniformly distributed between 0 and 1, was added to the set and Access
sorted the data into two parts (random < .5 and random > .5). These partitions of the data
set produced two sets containing approximately 60,000 records each. One set was used
to build the models, the other was saved to test their predictive power. Once partitioned,
the model building data were further subdivided to exclude DEP attrites from RTC
analysis. The final data consisted of two partitioned data sets for DEP and RTC attrition
analysis.
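The random partition described above was done in Microsoft Access. As a rough illustration only, the same idea can be sketched in Java; the class name, method names, seed and record count below are hypothetical and are not taken from the original analysis.

```java
import java.util.Random;

// Minimal sketch (not the original Access procedure): assign each record to the
// model-building set or the held-out test set using a uniform random draw.
final class RandomPartitionSketch {

    /** Returns one flag per record: true = model-building set, false = test set. */
    public static boolean[] assign(int records, long seed) {
        Random rng = new Random(seed);
        boolean[] inBuildSet = new boolean[records];
        for (int i = 0; i < records; i++) {
            // Uniform draw on [0, 1); values below 0.5 go to the build set.
            inBuildSet[i] = rng.nextDouble() < 0.5;
        }
        return inBuildSet;
    }

    /** Counts how many records were assigned to the model-building set. */
    public static int countBuild(boolean[] inBuildSet) {
        int count = 0;
        for (boolean flag : inBuildSet) {
            if (flag) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 120000 is an illustrative record count, not the exact size of the cleaned data set.
        boolean[] flags = assign(120000, 1997L);
        int build = countBuild(flags);
        System.out.println("build=" + build + " test=" + (flags.length - build));
    }
}
```

Because each record is assigned independently, the two sets are only approximately equal in size, which matches the "approximately 60,000 records each" reported above.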


B. LOGISTIC (LOGIT) REGRESSION 


Previous research indicated that logistic, or logit, regression is a widely used 
technique for attrition analysis. As with other regression techniques, logit regression 
models a dependent variable by a linear combination of many independent variables. In 
attrition analysis, the dependent variable is categorical (i.e., whether or not a recruit 
attrites) and researchers are interested in the probability a person with a given set of 
characteristics will attrite. Since the outcome is a probability and bounded by zero and 
one, OLS regression is not suitable. Logit regression, however, will result in “predictive
values which correspond to the probability of a positive (attrition) outcome” (Martin,
1995). The logistic model is defined by

$$\Pr[Y_i = 1 \mid X_i] = \frac{1}{1 + \exp[-(X_i' \beta)]}$$

where $Y_i$ is the dependent variable for recruit “i”, DEP or RTC attrition, and $X_i$ represents
the vector of independent variables (characteristics) of recruit “i” (male, GED, etc.). $\beta$
represents the vector of unknown regression coefficients for the model.
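To make the logistic formula concrete, the following minimal sketch evaluates the attrition probability for one recruit. The coefficient values and the choice of predictors are hypothetical, chosen only for illustration; they are not the fitted values from this study.

```java
// Minimal sketch of the logistic model: Pr[Y = 1 | x] = 1 / (1 + exp(-(x'beta))).
final class LogitSketch {

    /** Returns the modeled probability of attrition for characteristics x. */
    public static double attritionProbability(double[] x, double[] beta) {
        if (x.length != beta.length) {
            throw new IllegalArgumentException("x and beta must have the same length");
        }
        double linear = 0.0;
        for (int j = 0; j < x.length; j++) {
            linear += x[j] * beta[j];
        }
        return 1.0 / (1.0 + Math.exp(-linear));
    }

    public static void main(String[] args) {
        // Hypothetical coefficients: intercept, GED indicator, MALE indicator.
        double[] beta = {-1.5, 0.6, -0.2};
        // A hypothetical male recruit with a GED (the leading 1.0 is the intercept term).
        double[] x = {1.0, 1.0, 1.0};
        System.out.println(attritionProbability(x, beta));
    }
}
```

Whatever the value of the linear combination, the result stays strictly between zero and one, which is the property that makes the logit form suitable where OLS is not.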

Using S-Plus (Mathsoft Inc., 1995), DEP and RTC data were modeled using logit 
regression. The first step was to build a model using all of the potential predictors (Table 
3) and some possible interactions. The interactions are listed in Table 4. 


Table 4: Interactions

BONUS & GED
NCF & GED
SMOKE & RUNJOG
CHEW & RUNJOG
CRIME WAIVERS & DRUG WAIVERS








The interaction “BONUS & GED” was incorporated to look at possible 
motivation levels among GED entrants. “NCF & GED” was also incorporated to look at 
educational motivation among GED entrants. “SMOKE & RUNJOG” and “CHEW & 
RUNJOG” will examine whether the effect of tobacco use is different for runners than for 
non-runners. The waiver interaction is included to see if these two factors interact. 

With all main effects and these interactions included, the full model was
estimated and then the least significant variables were deleted (one at a time). The
absolute t-values of the coefficients were computed and the coefficient corresponding to
the smallest of these was deleted if its t-ratio was insignificant with α = .05. The model
was rebuilt and the process repeated until all coefficients had t-values which were
significant with α = .05. The goal was to build a statistically sound model with the
fewest predictive variables.
Step-wise variable removal can produce questionable t-values in the resulting
model so critical levels were adjusted using the Bonferroni inequality method. Ten
variables were removed in the RTC model (among them were interactions) and four
variables were removed in the DEP model. α was adjusted to 0.05/10 = .005 for the RTC
model and to 0.05/4 = 0.0125 for the DEP model.


C. CLASSIFICATION TREES 

An alternative to logit regression is to use classification trees to describe the
structure of the data (Breiman et al., 1984). Classification trees are similar to regression
in that they model a dependent variable by the values of many independent variables. A 
classification tree is one where the dependent variable is categorical. Trees for continuous 
responses are referred to as regression trees. Fitting a tree model is a recursive procedure 
resulting in terminal nodes or “leaves” containing groups of cases with similar values in 
their independent variables and differences in the dependent variables, which reflect 
response probabilities. 

The process begins with a parent node. This node has a “purity measure” with
respect to the dependent variable. This purity measure is defined by S-Plus as deviance.
The deviance formula follows:

$$\text{Deviance}_i = -2 \sum_k n_{ik} \log(p_{ik})$$

where “i” labels the node, “k” labels the classes in the node (here these are “attrite” or
“no attrite”), “$n_{ik}$” represents the number of cases with class “k” in node “i” and “$p_{ik}$” is
the multinomial probability associated with node “i” and class “k”. The total deviance of
the final tree is the sum of the leaf deviances. For each node, S-Plus looks at every
variable and every possible binary split within that variable and chooses the variable and
split that brings about the maximum reduction in deviance at each stage, splitting the
node into two child nodes. Each pair of child nodes has a combined deviance that is
no larger than that of their parent. (Venables & Ripley, 1994)
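The deviance calculation and the deviance reduction of a candidate split can be sketched as follows. This is an illustration, not the S-Plus implementation: it assumes the node probabilities are estimated by the observed class proportions, uses the natural logarithm, and treats a class with zero cases as contributing zero. All names and counts are hypothetical.

```java
// Minimal sketch of node deviance and the deviance reduction of a binary split.
final class DevianceSketch {

    /** Deviance of a node given its class counts, e.g. {attrite, no attrite}. */
    public static double nodeDeviance(int[] classCounts) {
        int total = 0;
        for (int n : classCounts) {
            total += n;
        }
        double sum = 0.0;
        for (int n : classCounts) {
            if (n > 0) {
                // p is estimated as the observed class proportion in the node.
                sum += n * Math.log((double) n / total);
            }
        }
        return -2.0 * sum;
    }

    /** Reduction in deviance from splitting a parent node into two child nodes. */
    public static double splitReduction(int[] parent, int[] left, int[] right) {
        return nodeDeviance(parent) - (nodeDeviance(left) + nodeDeviance(right));
    }

    public static void main(String[] args) {
        // Hypothetical counts: 50 attrites and 50 non-attrites in the parent node.
        int[] parent = {50, 50};
        int[] left = {30, 10};
        int[] right = {20, 40};
        System.out.println(splitReduction(parent, left, right));
    }
}
```

A tree-growing routine would evaluate this reduction for every candidate split and keep the largest. A perfectly pure node has deviance zero, and the reduction is never negative under these assumptions.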

In the attrition analysis, the initial parent node, or root, will contain all of the 
records in the data set. In the case of binary or categorical independent variables the 
splits are pre-determined by the variable (e.g., male or female in the binary case, WHITE 
or BLACK and ASIAN in the categorical case). In the case of continuous independent 
variables, the possible splits depend upon the data representation. For example, the age 
variable is tracked with a precision in tenths of a year; S-Plus will look at each possible 
split between tenths (e.g., if the data is 22.6 and 22.7 years, S-Plus will analyze the split 
between values, i.e., above 22.65 and below 22.65). When the program has found the 
best split (biggest reduction in deviance) for each variable, it will choose the best split 
across all variables. The procedure is repeated for each child. Figure 7 depicts a 
hypothetical example. 

The tree algorithm often results in over-fitting the data, especially with large data 
sets. To compensate for this, S-Plus provides methods to reduce the size of the tree to an 
optimal predictive size. Cross-validation identifies the optimal-size tree and pruning 
enables the analyst to choose a tree size by selecting the number of terminal leaves. 

Cross-validation repeatedly grows and prunes trees. The data is randomly split
into ten sets or partitions. A sequence of trees (sizes 2, 3, 4, etc.) is grown with all but
one of the data partitions; the remaining partition is used to test the predictive powers of
the trees; the deviance of each tree is computed for the partition left out. The quality of
the tree is then evaluated for a range of possible sizes. This process is repeated for the
other partitions and the minimum deviance, of the ten partitions, for each size tree can be
compared to the model size. The optimal size tree is determined by plotting model size
versus minimum deviance and finding the minimum of these deviances.

[Figure 7 tree diagram: the root node splits on sex (Female, Male); the Female child node splits on AGE < 22.65 and AGE > 22.65.]

Figure 7. Hypothetical Tree Example

Interpreting the results of the tree is accomplished by reading the probabilities in
the terminal leaves. For example, Figure 7 would indicate that 11% of the women under
22.65 years of age would attrite (hypothetically). Ease of interpretation is a key benefit
of tree-based models. Using the tree functions within S-Plus, a classification tree model
was developed for the two attrition cases (DEP and RTC). Cross-validation was used to
determine the optimal size and the trees were pruned accordingly.
D. PREDICTION


Both the logit model and the classification tree result in probability estimates for
attrition. To test the predictive power of the models, the data sets were randomly
partitioned; half of the data was not used in the model development. For testing, these
remaining partitions were run through the models and if the probability estimate was
above a pre-determined decision threshold, the individual was scored as an attrite. For
example, if the fitted value from the logistic regression model was .7 and the threshold
was .69, the individual was predicted to attrite. For the tree
models, fitted probabilities were obtained by using the “predict()” function built into
S-Plus. The “predict()” function uses the model to derive the probability of both positive
(attrition) and negative (non-attrition) responses for a given data set. In the attrition
analysis, each record (row) was fitted with a predicted probability and this probability
was compared to the derived threshold. A record of predicted attrites was kept and compared
to the actual data records.

Correct predictions fell into two categories: attrites and non-attrites. Correct
attrite predictions were those where the fitted value (probability) calculated by the model
was above the optimal threshold and the individual actually attrited. Correct
non-attrite predictions were those where
the fitted value was below the threshold and the individual did not attrite. The sum of
these two types of predictions was recorded. The final result was a number of correct
predictions for each model.


The decision threshold was developed using the fitted values from each model
and the actual attrition values from the data used to build the models. A simple
optimization program was constructed using JAVA 1.1.4. The program read in the actual
and fitted values for each record in the data set and walked through a preset number (200)
of possible probability thresholds for the attrite decision. Several step sizes were tried
and it was determined that a step size of 0.005 provided sufficient accuracy. A count of
correct predictions was made for each threshold and the probability associated with the
maximum number of correct predictions was identified for each model. The code for the
program is listed in Appendix C. Figure 8 shows plots of threshold versus number of
correct predictions for each model while Table 5 lists the optimal decision thresholds for
each model.
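The threshold search described above can be sketched compactly as follows. This is an independent minimal sketch and not the program listed in Appendix C: the class and method names are hypothetical, it assumes a record is scored as an attrite when its fitted value is strictly above the threshold, and it assumes ties are resolved in favor of the lowest threshold.

```java
// Minimal sketch of a threshold sweep that maximizes the number of correct predictions.
final class ThresholdSweepSketch {

    /** Counts records whose thresholded prediction matches the actual outcome (0 or 1). */
    public static int countCorrect(double[] fitted, int[] actual, double threshold) {
        int correct = 0;
        for (int i = 0; i < fitted.length; i++) {
            int predicted = fitted[i] > threshold ? 1 : 0;
            if (predicted == actual[i]) {
                correct++;
            }
        }
        return correct;
    }

    /** Walks through steps thresholds (stepSize, 2*stepSize, ...) and returns the best one. */
    public static double bestThreshold(double[] fitted, int[] actual, int steps, double stepSize) {
        double best = stepSize;
        int bestCorrect = -1;
        for (int s = 1; s <= steps; s++) {
            double threshold = s * stepSize;
            int correct = countCorrect(fitted, actual, threshold);
            if (correct > bestCorrect) {
                bestCorrect = correct;
                best = threshold;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical fitted probabilities and outcomes for four records.
        double[] fitted = {0.12, 0.231, 0.81, 0.93};
        int[] actual = {0, 0, 1, 1};
        // 200 steps of 0.005 cover thresholds from 0.005 to 1.0, as in the text.
        System.out.println(bestThreshold(fitted, actual, 200, 0.005));
    }
}
```

With 200 steps of size 0.005 the sweep covers the whole probability range, which is consistent with the preset number of thresholds and the step size reported in the text.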


[Figure 8 plots threshold versus number of correct predictions for each model; the surviving panel titles are DEP Logistic and DEP Tree.]

Figure 8. Model Threshold Plots

Table 5. Optimal Thresholds

Model          Optimal Threshold
DEP Logistic   0.54
DEP Tree       0.77
RTC Logistic   0.33
RTC Tree       0.2
III. RESULTS


A. DEP LOGISTIC MODEL 
The logistic model for the Delayed Entry Program is summarized in Table 6.
Model significance can be assessed by comparing the difference between the null
deviance and residual deviance with a χ² distribution with eleven degrees of freedom (the number of
parameters in the model). (Venables and Ripley, 1994) This approximation shows the
model would be significant at very high confidence levels (58284.38 - 52397.63 =
5886.75; this is compared to a χ²(11), which has an expected value of 11 and a standard
deviation of √22 ≈ 4.69).
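The arithmetic behind this comparison can be written out explicitly. The symbols $G$, $D_{\text{null}}$ and $D_{\text{res}}$ are labels introduced here for illustration; the numbers are the deviances and degrees of freedom reported with Table 6.

```latex
\begin{aligned}
G &= D_{\text{null}} - D_{\text{res}} = 58284.38 - 52397.63 = 5886.75, \\
\text{df} &= 62252 - 62241 = 11, \\
\mathrm{E}\left[\chi^2_{11}\right] &= 11 \ll G .
\end{aligned}
```

The observed difference is several hundred times the expected value of the reference distribution, which is why the model is significant at any conventional level.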


Table 6. DEP Logistic Model Summary

Variable     Coefficient   Std. Error   t-value
SCHEDDEP     0.008         0.00001      39.912
JOBCHANGE    -0.287        0.033        -8.787

Null Deviance: 58284.38 on 62252 degrees of freedom
Residual Deviance: 52397.63 on 62241 degrees of freedom











The factors that significantly (α = .0125) increase the probability of attrition with
an increase in their value are AGE, two races (WHITE and BLACK), Education Level
(GED), and Time Scheduled for DEP (SCHEDDEP). The factors that significantly
decrease the probability of attrition with an increase in their value are AFQT score, sex
(MALE), enlisting as a senior in high school (SENIOR), taking an enlistment bonus
(BONUS) and changes in future billet assignments (JOBCHANGE). All other variables
listed in Table 3, and interactions from Table 4, were removed for insignificance.


B. DEP TREE MODEL 
Figure 9. DEP Tree Size vs Deviance

Cross-validation identified an optimal tree with 52 terminal nodes. The large size
of this tree made it very difficult to interpret as it had over ten levels of splits. As a
result, the cross validation data (model size and deviance) was analyzed to see if there
was an alternative size tree with similar deviance and predictive power. Figure 9 shows
the relationship between deviance and model size. The deviance (as depicted in Figure 9)
is almost flat once model size grows above 20 terminal nodes. The actual difference in
deviance between the tree with 20 nodes and the tree with 52 nodes is 5460 - 5440 = 20.
The smaller tree, with 20 terminal nodes, was much easier to interpret (since it had only
six levels). Further analysis indicated that the 20-node tree predicted with the same level
of accuracy as the 52-node tree. The model selected and analyzed in this paper contained
20 terminal nodes (the code for all trees, including the 52-node tree, can be found in
Appendix D).

The DEP tree model with 20 terminal nodes is depicted in Figure 10. The number
inside each node is the attrition probability for all of the cases within the node. The root
shows an attrition probability of 0.18. Rectangular nodes are terminal nodes and the
number of cases in the node is listed beneath the node. The first split divides the cases
into two sets (those with high school degrees and those without high school degrees). If
the individual has no high school degree, is not a senior upon enlistment, and has an
AFQT score below 49.5, he or she has a 0.76 probability of attrition (for scheduled DEP
durations less than 121.5 days) or a 0.97 probability of attrition (for scheduled DEP
durations above 121.5 days). If an individual has no high school degree, is a senior upon
enlistment, but earns a GED or fails to graduate, he or she has an attrition probability of
0.98. If an individual has a high school degree, is female, and is scheduled for DEP less
than 75.5 days, she has a 0.06 attrition probability. Other specific cases can be evaluated
by using Figure 10.
Figure 10. DEP Tree Model
C. RTC LOGISTIC MODEL 
The RTC logistic model is summarized in Table 7. The χ² statistic (as discussed
in section III.A) has 15 degrees of freedom and shows the model’s significance
(33135.88 - 32320.22 = 815.66; this is compared to a χ²(15), which has an expected
value of 15 and a standard deviation of √30 ≈ 5.48). The factors that increase the probability of
attrition (α = .005) with an increase in value are AGE, two races (WHITE and BLACK),
two educational levels (NONGRAD and GED),
tobacco use (SMOKE only), waivers (CRIME and OTHER), changes in job or
program classification (JOBCHANGE), and the interaction between SMOKE and
RUNJOG. The factors which decrease the probability of attrition with an increase in
their value are AFQT score, time scheduled to be in DEP (SCHEDDEP), RUNJOG, and
the interaction between CHEW and RUNJOG. Other variables and interactions were
removed from the model for insignificance.

Table 7. RTC Logistic Model Summary

Variable     Coefficient   Std. Error   t-value
SCHEDDEP     -0.001        0.0001       -6.427

Null Deviance: 33135.88 on 47464 degrees of freedom
Residual Deviance: 32320.22 on 47449 degrees of freedom


D. RTC TREE MODEL 

The RTC tree model is depicted in Figure 11. Cross-validation identified an
optimal tree with nine terminal nodes and this size was used for the ensuing analysis.
The root node indicates an overall probability of attrition of 0.11. The first split divides
the cases into two sets: smokers and non-smokers. Smokers with AFQT scores below
66.5 who were scheduled for DEP for less than 81.5 days have the highest probability of
attrition (0.20). Non-smokers who run or jog at least three times a week, with AFQT
scores above 49.5 and a scheduled DEP duration greater than 18.5 days, have the lowest
probability of attrition (0.065). Other specific cases can be evaluated using Figure 11.

Figure 11. RTC Tree Model


E. PREDICTION 


The prediction methodology, discussed in section II.D, established the optimal
decision thresholds depicted in Table 5. Using the data which was held out (i.e., not used
to build the models), the thresholds depicted in Table 5, and the “predict()” function
within S-Plus, fitted values for each model were obtained. As discussed earlier, the fitted
values (probabilities) were compared to the appropriate threshold and scored accordingly.

A summary of the prediction results is depicted in Table 8. The net gain column
represents the difference between using the model and not using the model. “Not using
the model” means that all of the predictions are “no attrite;” the values in that column
reflect actual attrition. For example, the DEP logistic model predicts 49229 of the
61947 accessions correctly. If the model is not used, CNRC predicts every individual to
complete (i.e., all 61947) and is correct for 49158 of the accessions; the remaining 12789
attrite. For the DEP logistic model, using the model resulted in 71 more correct
predictions than not using the model. For the DEP tree, the model had 3954 more correct
predictions. Neither RTC model had any impact on the predictive outcome (i.e., no
additional correct predictions were made).
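The net gain arithmetic for the DEP logistic model can be checked with a few lines of code. The counts are those reported in the text; the class and method names are hypothetical and serve only as an illustration.

```java
// Minimal sketch of the net gain calculation: correct predictions with the model
// minus correct predictions when everyone is predicted "no attrite".
final class NetGainSketch {

    /** Correct predictions when every individual is predicted not to attrite. */
    public static int baselineCorrect(int total, int attrites) {
        return total - attrites;
    }

    /** Additional correct predictions gained by using the model. */
    public static int netGain(int modelCorrect, int total, int attrites) {
        return modelCorrect - baselineCorrect(total, attrites);
    }

    public static void main(String[] args) {
        // DEP logistic model: 49229 correct of 61947 accessions, 12789 actual attrites.
        System.out.println(netGain(49229, 61947, 12789)); // prints 71
    }
}
```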

Table 8. Model Prediction Summary 
Further analysis of the predictions indicated that the improvements realized by
both of the DEP models reflected correct attrite predictions. While the 71 DEP logistic
attrite predictions cannot be attributed to a specific set of characteristics, the 3954 DEP
tree attrite predictions fall exclusively in a single node of the tree. This node is defined
by individuals with no high school degree, who are seniors upon enlistment, but fail to
graduate (or they get a GED) from high school. This node has an attrition probability of
0.98. There were a total of 4158 cases in this node.
IV. CONCLUSIONS


A. ADDRESSING THE HYPOTHESES 

It was hypothesized in section I.F that individuals who smoke and do not
exercise have a higher probability to attrite from RTC. The RTC logistic model indicates
that smoking increases the probability of attrition. The RTC logistic model also shows 
that exercise (represented by RUNJOG) decreases the probability of attrition 
(coefficient=-0.34). The RTC logistic model indicates that individuals who smoke and 
do not exercise have a higher propensity to attrite than those who exercise and do not 
smoke. The RTC tree model confirms the increased probability of attrition for smokers, 
but never splits on RUNJOG for smokers. 

The second hypothesis asserted that A-cell individuals have a lower propensity 
to attrite (from both RTC and DEP) than B-cell, C-cell or D-cell individuals. The DEP 
logistic model confirms the inverse relationship between AFQT score and DEP attrition 
as well as the direct relationship between GED and DEP attrition, but it does not find 
NONGRAD to be significant. The results of the DEP model are inconclusive with 
respect to A-cells. The RTC logistic model clearly supports the hypothesis. The DEP 
tree model consistently shows that AFQT and DEP attrition are inversely related and also 
shows that not graduating from high school will result in a higher probability of attrition. 
The RTC tree model illustrates the inverse AFQT relationship but is inconclusive because 
it never splits on educational level. 

The third hypothesis states that individuals who take an incentive package are less
likely to attrite from DEP or RTC than those who do not take an incentive package. The
DEP logistic model supports the hypothesis for both the NCF and EB. The DEP tree
model splits once on NCF (those with no high school degree, who are not seniors upon
enlistment, who have a scheduled DEP duration between 121.5 and 225 days, and who
have AFQT scores above 49.5) and this directly contradicts the hypothesis, but the
relevant terminal node contains only 17 cases. The RTC logistic model does not find
either variable significant and the RTC tree model never splits on incentive packages.
With the exception of the DEP logistic model the results are inconclusive.


B. ADDRESSING THE RESEARCH QUESTIONS 

1. Other Predictive Factors 

There were several predictive factors identified by the models not addressed in the 
previous section. First is the DEP tree node with the 0.98 DEP attrition probability and 
the exceptional predictive power (the node predicted 3954 attrites correctly). The node 
can be summarized as individuals who enlist as seniors in high school but fail to graduate
(they may get a GED) from high school.

Research into this node indicates that these attrites actually fall into two
categories: “Fail to Grad” and “Fail to Obligate” (see Appendix A). “Fail to Grad”
categorizes individuals who are disqualified by the Navy for failing to graduate from high
school. As discussed in section I.C, the Navy has a cap on NON-GRADS of 5% which is
more stringent than the congressional mandate of 10%. A recruiter will realize in summer,
which falls in the fourth quarter of the fiscal year, that an individual is not going to
graduate from high school. At that time, the quota for NON-GRADS will usually be full
(or close to full) and these individuals will be lost.
The second group, and vast majority of these individuals (98%), are actually
individuals who "Fail to Obligate" or quit from DEP. When individuals enter the DEP as 
high school seniors, they are given an educational code of "P" (probable graduate). 
When the individual drops out of the DEP before high school graduation (assuming he 
signs up as a senior in high school), the individual's education code is not updated and the 
tree classification does not reflect the actual educational level. There is no predictive 
power in this node. 

The DEP logistic model and the DEP tree model both show that females have a 
higher propensity to attrite from DEP than males, but this is not the case in RTC. 
Variables such as SCHEDDEP and AFQT simply confirm previous research for both 
DEP and RTC attrition. Finally, all of the models indicate that AGE is directly related to 
attrition. 

2. Comparing Methodologies 

Comparing and contrasting the two methodologies is the second research 
question. Given the categorical nature of the data and binary response, both classification 
trees and logistic regression were well-suited for the problem. The models produced 
consistent probabilistic outcomes that center on the actual attrition rates. While the RTC 
models were similar, the DEP tree model did reveal structure within the data, which was 
not discernible from the logit model. While both provide insight into attrition, neither 
method was able to explain the phenomenon fully. The true structure of the phenomenon 
may not be discernible from the current data sets and it is recommended that CNRC and 
CNA continue to collect new data (similar to the SHIP survey) in hopes that models with 
better predictive power can be developed. 
3. Policy Implications

Policy decisions related to this type of study often include adopting new screening 
procedures which exclude individuals with certain sets of characteristics. All of the 
models identified variables with statistical significance or structure and these both 
provide insight into attrition probabilities. Without exception, the models predicted 
poorly. Despite strong statistical significance, these results should not be used to predict 
an individual's outcome and possibly exclude him or her from service. These results do,
however, improve our understanding of group behavior, which can be beneficial in
aggregate forecasting, simulation and decision modeling.
APPENDIX A. DEP ATTRITION CODES

This appendix contains a table with all of the actual DEP attrition codes and the
corresponding sub-categories referred to in Figure 4.

Code | DEP ATTRITION REASON | Figure 5 Category
RCE | MEPS Drug Positive | [illegible]
RGE | [illegible] | [illegible]
RGF | Fail to Graduate | Technicality
ROK | Rescind Recommendation | Admin
ROB | Refuses to Obligate | Failed to Obligate
ROC | Member Reached E4 | [illegible]
ROD | [illegible] Class Date Not Available | Admin
ROE | Schedule Precludes Attendance | Admin
ROF | Desires TAD vice PCS Orders | Admin
ROO | [illegible] | [illegible]
APPENDIX B. RTC ATTRITION CODES

This appendix contains a table with all of the actual RTC attrition codes and the
corresponding sub-categories referred to in Figure 5.

Attrition Code | Academic/Non-Academic | RTC ATTRITION REASON | In Service (I) / Prior to Service (P) | Figure 6 Category
[illegible] | [illegible] | [illegible] | P | Medical
[illegible] | [illegible] | [illegible] | P | Medical
[illegible] | [illegible] | [illegible] | P | Medical
[illegible] | [illegible] | [illegible] | P | Medical
[illegible] | Non-Academic | Medical-Urology | P | Medical
[illegible] | Non-Academic | Medical-Ophthalmology | P | Medical
[illegible] | [illegible] | Psych-Suicidal Situational Reaction | [illegible] | [illegible]
[illegible] | [illegible] | Medical-Not Aquatically Adaptable | [illegible] | [illegible]
[illegible] | [illegible] | Legal-Civilian Conviction | I | Legal
202 | Non-Academic | Legal-Breach of Contract | I | Legal
208 | Non-Academic | Legal-Misconduct | I | Legal
205 | Non-Academic | Legal-Homosexual | I | Legal
206 | Non-Academic | Legal-Drugs | I | Legal
207 | Non-Academic | Non-Training Related Death | [illegible] | Technicality
215 | Non-Academic | Erroneous Enlistment | P | Admin
216 | Non-Academic | Erroneous Enlistment Best | P | Admin
217 | Non-Academic | Erroneous Enlistment NavAF | P | Admin
[illegible] | [illegible] | [illegible] | [illegible] | Motivation
220 | Non-Academic | Drug Screen-Non-CNBS | P | Screen
221 | Non-Academic | Drug Screen-CNBS | P | Screen
222 | Non-Academic | Drug Screen | P | Screen
223 | Non-Academic | Homosexual | P | Screen
224 | Non-Academic | Arrest Record | P | Screen
226 | Non-Academic | Undisclosed Military Service | P | Screen
311 | Non-Academic | Other | [illegible] | Screen
367 | Non-Academic | Medical-Other | P | Medical
APPENDIX C. JAVA SOURCE CODE


This appendix contains the Java 1.1.4 source code for the optimal threshold
model. The model was run using step sizes of .0001, .0005, .001 and .005. There was no
difference in threshold value or performance using the larger step size. The code
follows; comments are preceded by "//" and are in bold print:


// java import classes for input and output methods
import java.util.*;
import java.io.*;

public class FindBest {

    // instance variable and array declarations

    // array to store predicted (fitted) values
    private double[] pred;

    // array to store actual attrite (0/1)
    private double[] actual;

    // array to store the number of correct predictions for each step
    private double[] corrPred;

    // array to store the corresponding threshold for the number of
    // correct predictions in corrPred
    private double[] probs;

    private int observations;

    // counting variables
    private double difference;
    private double countActual;
    private double countCorrect;
    private double countWrong;
    private double bigCount;

    // alias probability variable
    private double cutProb;

    // input object variables to read files
    private BufferedReader inStream;
    private String fileName;


    // constructor
    public FindBest(String file, int obs, double step) {

        // variable initialization
        fileName = file;
        bigCount = 0;
        observations = obs;
        pred = new double[observations];
        actual = new double[observations];
        corrPred = new double[200];
        probs = new double[200];

        // since fitted and actual values are bounded by zero and
        // one the arrays are set to 2 to trigger errors
        for (int i = 0; i < obs; i++) {
            pred[i] = 2;
            actual[i] = 2;
        }
        countActual = 0;
        countCorrect = 0;
        countWrong = 0;
        cutProb = 0;
        difference = 0;
        int j = 0;
        inStream = null;

        // this reads the file and fills up the arrays
        try {
            inStream = new BufferedReader(new FileReader(file));
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        try {
            // the first line of the file is read and discarded
            String line1 = inStream.readLine();
            for (String line = inStream.readLine(); line != null;
                 line = inStream.readLine()) {
                bigCount++;
                StringTokenizer s = new StringTokenizer(line);
                while (s.hasMoreTokens()) {
                    pred[j] = Double.valueOf(s.nextToken()).doubleValue();
                    actual[j] = Double.valueOf(s.nextToken()).doubleValue();
                }
                j++;
            }
        }
        catch (NumberFormatException e) {
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }


    // the following method steps through the arrays and does a
    // correct prediction count for the given threshold, num is
    // the index for the threshold, the threshold and count are
    // then stored in a different array.
    private void doCount(double threshold, int num) {
        double d2 = 0;
        double sum = 0;
        double countRight = 0;
        double countTot = 0;
        double setProb = 0;
        double countWrong = 0;
        double countStay = 0;
        double countStayWrong = 0;
        double countStayTot = 0;
        for (int i = 0; i < observations - 5; i++) {
            if ((pred[i] > threshold) && (actual[i] == 1)) {
                countRight++;
            }
            if (actual[i] == 1) {
                countTot++;
            }
            if ((pred[i] > threshold) && (actual[i] == 0)) {
                countWrong++;
            }
            if ((pred[i] <= threshold) && (actual[i] == 0)) {
                countStay++;
            }
            if ((pred[i] <= threshold) && (actual[i] == 1)) {
                countStayWrong++;
            }
            if (actual[i] == 0) {
                countStayTot++;
            }
        }
        difference = countRight - countWrong;
        d2 = countStay - countStayWrong;
        sum = countRight + countStay;
        corrPred[num] = sum;
        probs[num] = threshold;
        System.out.println("Sum/Prob/Tot=, " + sum +
            ", " + threshold + ", " + countTot + ", " + countStayTot + ", " + fileName);
    }
    // the following method steps through the array of correct
    // predictions and finds the threshold with the highest
    // number of correct predictions and prints it out
    private void findOptimum(double s) {
        double temp = 0;
        int count = 0;
        for (int i = 0; i < (200 - 1); i++) {
            if (corrPred[i] > temp) {
                temp = corrPred[i];
                count = i;
            }
        }
        System.out.println(fileName + ", " + "Correct Predictions = "
            + temp + " , " + " Threshold = " + probs[count] + ", " +
            bigCount);
    }


    // the main method implements the program
    public static void main(String[] args) {

        // files to be read
        String file1 = "G:/depcut.txt";
        String file2 = "G:/dtreecut.txt";
        String file3 = "G:/rtccut.txt";
        String file4 = "G:/rtreecut.txt";

        // the number of records in the appropriate data sets
        int ob1 = 61947;
        int ob2 = 61947;
        int ob3 = 47465;
        int ob4 = 47465;

        // step size
        double step = .005;

        // create an object for each model
        FindBest depLogCut = new FindBest(file1, ob1, step);
        FindBest depTreCut = new FindBest(file2, ob2, step);
        FindBest rtcLogCut = new FindBest(file3, ob3, step);
        FindBest rtcTreCut = new FindBest(file4, ob4, step);

        // loop through the steps from zero to one for each model
        int j = 0;
        for (double i = 0; i < 1; i += step) {
            depLogCut.doCount(i, j);
            j++;
        }
        j = 0;
        for (double i = 0; i < 1; i += step) {
            depTreCut.doCount(i, j);
            j++;
        }
        j = 0;
        for (double i = 0; i < 1; i += step) {
            rtcLogCut.doCount(i, j);
            j++;
        }
        j = 0;
        for (double i = 0; i < 1; i += step) {
            rtcTreCut.doCount(i, j);
            j++;
        }

        // find the best threshold for each model
        depLogCut.findOptimum(step);
        depTreCut.findOptimum(step);
        rtcLogCut.findOptimum(step);
        rtcTreCut.findOptimum(step);
    }
}
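
The listing above advances the threshold by repeatedly adding the step and stores its results in arrays of fixed length 200, which corresponds to the .005 step. As a minimal sketch only, and not the code used for the thesis results, the same search can be driven by an integer index so that the number of thresholds is exact for any step count. The class and method names (ThresholdSweep, countCorrect, bestThreshold) and the sample data are hypothetical.

```java
public class ThresholdSweep {

    /** Counts cases classified correctly when pred > threshold means "attrite". */
    static int countCorrect(double[] pred, int[] actual, double threshold) {
        int correct = 0;
        for (int i = 0; i < pred.length; i++) {
            boolean predictedAttrite = pred[i] > threshold;
            if (predictedAttrite == (actual[i] == 1)) {
                correct++;
            }
        }
        return correct;
    }

    /** Returns the first threshold k/steps, k = 0..steps-1, with the most correct predictions. */
    static double bestThreshold(double[] pred, int[] actual, int steps) {
        int bestCount = -1;
        double best = 0.0;
        for (int k = 0; k < steps; k++) {
            // threshold is derived from the index, so no rounding error accumulates
            double threshold = (double) k / steps;
            int c = countCorrect(pred, actual, threshold);
            if (c > bestCount) {
                bestCount = c;
                best = threshold;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // made-up fitted values and outcomes, for illustration only
        double[] pred = {0.10, 0.20, 0.35, 0.60, 0.80};
        int[] actual = {0, 0, 1, 1, 1};
        double t = bestThreshold(pred, actual, 200);
        System.out.println("threshold=" + t + " correct=" + countCorrect(pred, actual, t));
    }
}
```

Passing 200 as the step count corresponds to the .005 step size; 1000, 2000 and 10000 would correspond to the other step sizes mentioned above.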
APPENDIX D. TREE MODEL OUTPUT


This appendix contains the actual S-Plus version 3.3 tree outputs for the DEP 
trees with 20 and 52 terminal nodes and the RTC tree with nine terminal nodes. Each 
row contains the node split, the number of cases in the node, the deviance at the node, 
and the “probability of attrite” at that node. A “*” denotes a terminal node. 
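
As a reading aid, the deviance printed for each node appears to equal the residual sum of squares of the 0/1 attrite indicator about the node mean, that is n * p * (1 - p), where n is the number of cases and p is the "probability of attrite". This is an observation that fits the printed values, not something stated in the S-Plus output itself. The following minimal sketch, with the hypothetical names NodeDeviance and deviance, checks the relation against the root node of the DEP trees.

```java
public class NodeDeviance {

    /** Sum of squared deviations of a 0/1 response about its mean p, over n cases. */
    static double deviance(int n, double p) {
        return n * p * (1.0 - p);
    }

    public static void main(String[] args) {
        // root node of the DEP trees as printed: n = 62253, deviance = 9103, yval = 0.17790
        double computed = deviance(62253, 0.17790);
        System.out.println("computed deviance = " + computed + " (printed: 9103)");
    }
}
```

Small differences from the printed deviance are expected because the printed probability is rounded.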

1. DEP TREE, 20 TERMINAL NODES 


split, n, deviance, yval 
* denotes terminal node 


1) root 62253 9103.000 0.17790
  2) HSDG<0.5 7471 1752.000 0.62440
    4) SENIOR<0.5 3166 474.400 0.18350
      8) SCHEDDEP<121.5 2390 198.900 0.09163
        16) AFQT<49.5 17 3.059 0.76470 *
        17) AFQT>49.5 2373 188.100 0.08681 *
      9) SCHEDDEP>121.5 776 193.100 0.46650
        18) AFQT<49.5 112 1.964 0.98210 *
        19) AFQT>49.5 664 156.400 0.37950
          38) SCHEDDEP<225 549 118.500 0.31510
            76) NCF<0.5 532 110.300 0.29320 *
            77) NCF>0.5 17 0.000 1.00000 *
          39) SCHEDDEP>225 115 24.730 0.68700 *
    5) SENIOR>0.5 4305 209.700 0.94870
      10) NONGRAD<0.5 4158 85.180 0.97910 *
      11) NONGRAD>0.5 147 11.850 0.08844 *
  3) HSDG>0.5 54782 5658.000 0.11700
    6) MALE<0.5 9623 1634.000 0.21680
      12) SCHEDDEP<142.5 3862 354.600 0.10230
        24) SCHEDDEP<75.5 2251 127.800 0.06042 *
        25) SCHEDDEP>75.5 1611 217.400 0.16080 *
      13) SCHEDDEP>142.5 5761 1195.000 0.29350
        26) SENIOR<0.5 3468 841.600 0.41440
          52) SCHEDDEP<264.5 2325 531.700 0.35400
            104) WHITE<0.5 1061 223.900 0.30250 *
            105) WHITE>0.5 1264 302.600 0.39720 *
          53) SCHEDDEP>264.5 1143 284.200 0.53720 *
        27) SENIOR>0.5 2293 225.900 0.11080 *
    7) MALE>0.5 45159 3908.000 0.09568
      14) SENIOR<0.5 31324 3266.000 0.11820
        28) SCHEDDEP<174.5 27171 2267.000 0.09186
          56) SCHEDDEP<63.5 15064 811.800 0.05716 *
          57) SCHEDDEP>63.5 12107 1414.000 0.13500
            114) SCHEDDEP<102.5 4964 471.800 0.10640 *
            115) SCHEDDEP>102.5 7143 935.400 0.15500 *
        29) SCHEDDEP>174.5 4153 856.600 0.29090
          58) SCHEDDEP<277.5 3461 648.300 0.24960 *
          59) SCHEDDEP>277.5 692 173.000 0.49710 *
      15) SENIOR>0.5 13835 589.500 0.04460 *


2. DEP TREE, 52 TERMINAL NODES 


1) root 62253 9103.000 0.17790
  2) HSDG<0.5 7471 1752.000 0.62440
    4) SENIOR<0.5 3166 474.400 0.18350
      8) SCHEDDEP<121.5 2390 198.900 0.09163
        16) AFQT<49.5 17 3.059 0.76470 *
        17) AFQT>49.5 2373 188.100 0.08681
          34) SCHEDDEP<52.5 1865 114.900 0.06595 *
          35) SCHEDDEP>52.5 508 69.440 0.16340 *
      9) SCHEDDEP>121.5 776 193.100 0.46650
        18) AFQT<49.5 112 1.964 0.98210 *
        19) AFQT>49.5 664 156.400 0.37950
          38) SCHEDDEP<225 549 118.500 0.31510
            76) NCF<0.5 532 110.300 0.29320
              152) GED<0.5 238 55.720 0.37390 *
              153) GED>0.5 294 51.730 0.22790 *
            77) NCF>0.5 17 0.000 1.00000 *
          39) SCHEDDEP>225 115 24.730 0.68700 *
    5) SENIOR>0.5 4305 209.700 0.94870
      10) NONGRAD<0.5 4158 85.180 0.97910 *
      11) NONGRAD>0.5 147 11.850 0.08844 *
  3) HSDG>0.5 54782 5658.000 0.11700
    6) MALE<0.5 9623 1634.000 0.21680
      12) SCHEDDEP<142.5 3862 354.600 0.10230
        24) SCHEDDEP<75.5 2251 127.800 0.06042 *
        25) SCHEDDEP>75.5 1611 217.400 0.16080
          50) SENIOR<0.5 1414 206.400 0.17750
            100) JOBCHANGE<0.5 1127 180.100 0.19960
              200) WHITE<0.5 538 73.610 0.16360 *
              201) WHITE>0.5 589 105.100 0.23260
                402) SCHEDDEP<101.5 206 38.870 0.25240 *
                403) SCHEDDEP>101.5 383 66.140 0.22190 *
            101) JOBCHANGE>0.5 287 23.640 0.09059 *
          51) SENIOR>0.5 197 7.675 0.04061 *
      13) SCHEDDEP>142.5 5761 1195.000 0.29350
        26) SENIOR<0.5 3468 841.600 0.41440
          52) SCHEDDEP<264.5 2325 531.700 0.35400
            104) WHITE<0.5 1061 223.900 0.30250
              208) JOBCHANGE<0.5 910 199.400 0.32420
                416) SCHEDDEP<212 548 110.700 0.28100 *
                417) SCHEDDEP>212 362 86.080 0.38950 *
              209) JOBCHANGE>0.5 151 21.520 0.17220 *
            105) WHITE>0.5 1264 302.600 0.39720
              210) JOBCHANGE<0.5 1066 258.600 0.41370
                420) SCHEDDEP<148.5 57 11.050 0.26320 *
                421) SCHEDDEP>148.5 1009 246.100 0.42220
                  842) AGE<18.85 290 66.700 0.35860 *
                  843) AGE>18.85 719 177.800 0.44780
                    1686) AFQT<77.5 520 129.600 0.47310
                      3372) SCHEDDEP<251.5 476 118.900 0.48950
                        6744) SCHEDDEP<223.5 350 86.670 0.45140 *
                        6745) SCHEDDEP>223.5 126 30.360 0.59520 *
                      3373) SCHEDDEP>251.5 44 9.159 0.29550 *
                    1687) AFQT>77.5 199 46.970 0.38190 *
              211) JOBCHANGE>0.5 198 42.210 0.30810 *
          53) SCHEDDEP>264.5 1143 284.200 0.53720
            106) SCHEDDEP<318.5 615 153.600 0.48460 *
            107) SCHEDDEP>318.5 528 126.900 0.59850 *
        27) SENIOR>0.5 2293 225.900 0.11080 *
    7) MALE>0.5 45159 3908.000 0.09568
      14) SENIOR<0.5 31324 3266.000 0.11820
        28) SCHEDDEP<174.5 27171 2267.000 0.09186
          56) SCHEDDEP<63.5 15064 811.800 0.05716
            112) SCHEDDEP<6.5 1454 32.250 0.02270 *
            113) SCHEDDEP>6.5 13610 777.600 0.06084
              226) AGE<23.15 11287 596.600 0.05599 *
              227) AGE>23.15 2323 179.500 0.08437 *
          57) SCHEDDEP>63.5 12107 1414.000 0.13500
            114) SCHEDDEP<102.5 4964 471.800 0.10640 *
            115) SCHEDDEP>102.5 7143 935.400 0.15500
              230) AGE<18.35 904 79.430 0.09735 *
              231) AGE>18.35 6239 852.600 0.16330
                462) AGE<25.25 5771 767.200 0.15790
                  924) HISP<0.5 4902 678.200 0.16590
                    1848) AFQT<35.5 512 49.870 0.10940 *
                    1849) AFQT>35.5 4390 626.500 0.17240 *
                  925) HISP>0.5 869 86.950 0.11280 *
                463) AGE>25.25 468 83.080 0.23080 *
        29) SCHEDDEP>174.5 4153 856.600 0.29090
          58) SCHEDDEP<277.5 3461 648.300 0.24960
            116) BLACK<0.5 2977 533.800 0.23410
              232) AGE<22.65 2436 412.400 0.21590
                464) SCHEDDEP<224.5 1756 272.300 0.19190
                  928) WHITE<0.5 609 75.990 0.14610 *
                  929) WHITE>0.5 1147 194.400 0.21620 *
                465) SCHEDDEP>224.5 680 136.500 0.27790
                  930) AFQT<85.5 578 123.600 0.30970 *
                  931) AFQT>85.5 102 9.020 0.09804 *
              233) AGE>22.65 541 117.000 0.31610
                466) SCHEDDEP<264.5 520 110.000 0.30380 *
                467) SCHEDDEP>264.5 21 4.952 0.61900 *
            117) BLACK>0.5 484 109.400 0.34500
              234) AGE<18.15 36 3.556 0.11110 *
              235) AGE>18.15 448 103.700 0.36380 *
          59) SCHEDDEP>277.5 692 173.000 0.49710
            118) SCHEDDEP<312.5 303 73.770 0.41910 *
            119) SCHEDDEP>312.5 389 95.950 0.55780
              238) AGE<23.8 343 85.490 0.52770 *
              239) AGE>23.8 46 7.826 0.78260 *
      15) SENIOR>0.5 13835 589.500 0.04460
        30) SCHEDDEP<362.5 13673 562.700 0.04300
          60) SCHEDDEP<169.5 3499 79.120 0.02315 *
          61) SCHEDDEP>169.5 10174 481.700 0.04983 *
        31) SCHEDDEP>362.5 162 23.810 0.17900 *


3. RTC TREE, 9 TERMINAL NODES 


1) root 47465 4692.0 0.11120
  2) SMOKE<0.5 34069 2918.0 0.09460
    4) RUNJOG<0.5 15782 1571.0 0.11220
      8) AGE<18.55 5802 495.4 0.09428 *
      9) AGE>18.55 9980 1073.0 0.12250 *
    5) RUNJOG>0.5 18287 1338.0 0.07946
      10) SCHEDDEP<18.5 2154 226.3 0.11930 *
      11) SCHEDDEP>18.5 16133 1107.0 0.07413
        22) AFQT<49.5 5403 450.5 0.09180 *
        23) AFQT>49.5 10730 654.3 0.06524 *
  3) SMOKE>0.5 13396 1740.0 0.15350
    6) AFQT<66.5 8356 1198.0 0.17350
      12) SCHEDDEP<81.5 4037 635.4 0.19570 *
      13) SCHEDDEP>81.5 4319 559.1 0.15280 *
    7) AFQT>66.5 5040 533.1 0.12020
      14) SCHEDDEP<152.5 3568 413.2 0.13370 *
      15) SCHEDDEP>152.5 1472 117.7 0.08764 *
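
To illustrate how the RTC tree output above is read, the following minimal sketch follows the splits from the root to a terminal node and returns that node's "probability of attrite". It assumes that SMOKE and RUNJOG are 0/1 indicators (hence the splits at 0.5) and that SCHEDDEP, AGE and AFQT are in the units of the data set. The class, method and parameter names are hypothetical, and the terminal-node values are transcribed from the listing.

```java
public class RtcTreeSketch {

    /** Returns the terminal-node value reached by following the RTC tree splits. */
    static double attriteProbability(boolean smoker, boolean runsOrJogs,
                                     double age, double schedDep, double afqt) {
        if (!smoker) {                                    // node 2: SMOKE<0.5
            if (!runsOrJogs) {                            // node 4: RUNJOG<0.5
                return age < 18.55 ? 0.09428 : 0.12250;   // nodes 8 and 9
            }
            if (schedDep < 18.5) {                        // node 10
                return 0.11930;
            }
            return afqt < 49.5 ? 0.09180 : 0.06524;       // nodes 22 and 23
        }
        if (afqt < 66.5) {                                // node 6
            return schedDep < 81.5 ? 0.19570 : 0.15280;   // nodes 12 and 13
        }
        return schedDep < 152.5 ? 0.13370 : 0.08764;      // nodes 14 and 15
    }

    public static void main(String[] args) {
        // a hypothetical recruit: smoker, AFQT 40, scheduled DEP of 50
        System.out.println(attriteProbability(true, false, 20.0, 50.0, 40.0));
    }
}
```

The hypothetical recruit in main is a smoker with AFQT below 66.5 and SCHEDDEP below 81.5, so the sketch returns the value of terminal node 12, the highest in the tree.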